Group 25 Project

André Godinho s253707
Szymon Cholewiński s253711
Magnus Strange s190867
Lily Whipple s243865
Catarina Anastácio s253709

Introduction

  • Analysis of pancreatic adenocarcinoma (PAAD) data from The Cancer Genome Atlas (TCGA) PanCancer Atlas (Cerami et al. 2012; The Cancer Genome Atlas Research Network 2018)

  • Pancreatic cancer ranks 3rd in estimated cancer deaths in the USA for 2025 (Siegel et al. 2025) -> highly relevant data

  • The data contain cancer genomic data, scores and classifications -> enough information to perform the analysis

  • Initial inspection showed the data were not tidy -> ideal for the project's purposes

Parts of image TCGA-2L-AAQI-01Z taken from (Cerami et al. 2012; The Cancer Genome Atlas Research Network 2018; David Gutman and Lee Cooper, n.d.)

Methods and Materials

Materials:

Methodology

Data Description 1: Overview of Data Partitions

  • Before filtering, the dataset contains 184 patients and 63 clinical variables.
  • For practicality, the dataset is partitioned into five categories.
  • The categories are not fully conceptually distinct, but they provide a useful framework for analysing the clinical data.
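One way to represent such a partition is a named list that maps each category to its columns, so that each category can be pulled out as its own table while keeping a shared patient ID for joining. The sketch below is a minimal illustration only: the five category names, the column names and the toy values are hypothetical and do not reflect the project's actual categories.

```r
# Toy stand-in for the clinical table (hypothetical columns, synthetic values)
clinical <- data.frame(
  patient_id     = c("P1", "P2", "P3"),
  age            = c(61, 72, 55),
  sex            = c("Male", "Female", "Male"),
  ajcc_stage     = c("IIA", "IIB", "III"),
  mutation_count = c(40, 55, 31),
  buffa_score    = c(-5, 13, 21),
  os_months      = c(12.3, 20.1, 7.8),
  radiotherapy   = c("No", "Yes", "No")
)

# Hypothetical five-way partition of the variables
categories <- list(
  metadata           = c("age", "sex"),
  staging            = c("ajcc_stage"),
  genomic            = c("mutation_count"),
  scores             = c("buffa_score"),
  treatment_survival = c("os_months", "radiotherapy")
)

# One table per category; the ID column is kept in each so they can be re-joined
partitions <- lapply(categories, function(cols) {
  clinical[, c("patient_id", cols), drop = FALSE]
})

# Sanity check: every non-ID column belongs to exactly one category
all_cols <- unlist(categories, use.names = FALSE)
stopifnot(!anyDuplicated(all_cols),
          setequal(all_cols, setdiff(names(clinical), "patient_id")))
```

Because the real categories are "not fully conceptually distinct", a column could reasonably fit more than one category; the final check simply forces an explicit choice for each column.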

Data Description 2: Example Patient Metadata Overview

  • The R pipeline produced many descriptive plots of the data.
  • This plot was chosen as an example because it (1) comes early in the pipeline, (2) clarifies the stratification of patients by sex and ethnicity, and (3) showcases the flexibility of ggplot2.
  • Ideally, further analysis would stratify or clean the data by race, ancestry or ethnicity.

PCA on numeric data

PCA reduced the data to a few components while retaining most of the variance.

PCA loadings were plotted and revealed groups of strongly correlated variables, indicating redundancy in the feature set.

  • Some separation between groups is observed in the principal component space.
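A minimal base-R sketch of this step, assuming the numeric variables are centred and scaled before PCA (so variables on large scales, such as mutation counts, do not dominate) and that incomplete rows are dropped. The data are synthetic, with two deliberately correlated columns to mimic the redundancy seen in the loadings; the project's actual variable selection and missing-value handling may differ.

```r
set.seed(1)
n <- 100
# Synthetic numeric data: x2 is built from x1, so the two are strongly correlated
x1 <- rnorm(n)
numeric_data <- data.frame(
  x1 = x1,
  x2 = 2 * x1 + rnorm(n, sd = 0.1),
  x3 = rnorm(n),
  x4 = rnorm(n)
)

# Drop rows with missing values, then run PCA on centred and scaled variables
pca <- prcomp(na.omit(numeric_data), center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# Loadings: correlated variables load similarly on the same component
round(pca$rotation[, 1:2], 2)

# Scores (patient coordinates) for a PC1 vs PC2 scatter plot
head(pca$x[, 1:2])
```

Here x1 and x2 end up with large loadings of similar magnitude on PC1, which is the pattern that signals redundant features in a loadings plot.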

Genomic Alterations Across AJCC Cancer Stages

Genomic instability is present at all AJCC stages, not just in advanced disease.

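library(dplyr)
library(tidyr)
library(ggplot2)

# Attach readable stage labels, then reshape the three genomic
# instability metrics into long format (one row per patient and metric)
data_gene_code_filtered |>
  left_join(stage_labels, by = "NDSA_stage") |>
  pivot_longer(
    cols = c(
      `Fraction Genome Altered`,
      `Mutation Count`,
      `Tumor Break Load`
    ),
    names_to = "Genomic_Metric",
    values_to = "Value"
  ) |>
  ggplot(aes(x = stage_label, y = Value)) +
  # Violin shows the full distribution; the narrow boxplot adds median and IQR
  geom_violin() +
  geom_boxplot(width = 0.2) +
  # One panel per metric, each with its own y-axis range
  facet_wrap(~ Genomic_Metric,
             scales = "free_y",
             ncol = 1) +
  theme_uniform() +  # project-specific theme helper, not part of ggplot2
  theme(axis.text.x = element_text(angle = 45,
                                   hjust = 1)) +
  labs(
    x = "AJCC stage (n per stage)",
    y = "Metric value",
    title = "Genomic instability metrics across\nAJCC stages"
  )
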
  • Later stages tend to include more extreme values.
  • Distributions overlap between neighboring stages, so stage alone does not cleanly separate genomic burden.

Patient Survival based on Hypoxia and Tumor Break Load interaction

Tumor structural instability appears to interact with hypoxia to influence survival, highlighting biologically aggressive tumor subtypes.

  • Patients with both high hypoxia and high Tumor Break Load have the worst survival.

  • Mixed groups (high/low or low/high) show better survival than the ‘low/low’ group, suggesting that the adverse effect on survival arises mainly when both factors are high.
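A sketch of how the four hypoxia/Tumor Break Load groups could be formed and compared, assuming a median split on each variable and the `survival` package (`Surv`, `survfit`, `survdiff`). The column names and data are synthetic placeholders; the project's actual hypoxia score and cut-offs may differ.

```r
library(survival)

set.seed(42)
n <- 120
# Synthetic cohort (hypothetical columns): follow-up time, event indicator,
# one hypoxia score and Tumor Break Load per patient
df <- data.frame(
  os_months        = rexp(n, rate = 1 / 20),
  os_event         = rbinom(n, 1, 0.7),  # 1 = death, 0 = censored
  hypoxia          = rnorm(n),
  tumor_break_load = rpois(n, 30)
)

# Median split on each variable, then combine into four hypoxia/TBL groups
df$hyp_level <- ifelse(df$hypoxia > median(df$hypoxia), "high", "low")
df$tbl_level <- ifelse(df$tumor_break_load > median(df$tumor_break_load), "high", "low")
df$group <- factor(paste(df$hyp_level, df$tbl_level, sep = "/"))

# Kaplan-Meier curves per group and a log-rank test across the four groups
km <- survfit(Surv(os_months, os_event) ~ group, data = df)
lr <- survdiff(Surv(os_months, os_event) ~ group, data = df)

summary(km)$table[, "median"]  # median survival per group
lr
```

Kaplan-Meier curves only show the interaction visually; a Cox model with an interaction term, such as `coxph(Surv(os_months, os_event) ~ hyp_level * tbl_level, data = df)`, is one way to test it formally.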

Survival of Radiation Therapy Patients based on levels of Hypoxia

Low oxygen levels in tumors (hypoxia) are commonly associated with resistance to radiation therapy.

  • Buffa Score: Higher hypoxia → unexpectedly better survival

  • Ragnum Score: Lower hypoxia → clear survival advantage

  • Winter Score: Lower hypoxia → better survival, but less pronounced than Ragnum

Disagreement between hypoxia scores limits the interpretation of survival after radiotherapy.
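A minimal sketch of the per-score comparison, assuming radiotherapy patients are selected first, each hypoxia score is split at its median, and a log-rank test is run per score with the `survival` package. The column names (`radiotherapy`, `buffa`, `ragnum`, `winter`) and all data are synthetic placeholders, so the resulting p-values carry no biological meaning.

```r
library(survival)

set.seed(7)
n <- 150
# Synthetic cohort with hypothetical column names
df <- data.frame(
  os_months    = rexp(n, rate = 1 / 18),
  os_event     = rbinom(n, 1, 0.65),  # 1 = death, 0 = censored
  radiotherapy = sample(c("Yes", "No"), n, replace = TRUE),
  buffa        = rnorm(n),
  ragnum       = rnorm(n),
  winter       = rnorm(n)
)

# Keep only patients who received radiation therapy
rt <- df[df$radiotherapy == "Yes", ]

# For each hypoxia score: median split, then a two-group log-rank test
scores <- c("buffa", "ragnum", "winter")
p_values <- sapply(scores, function(s) {
  d <- data.frame(
    time  = rt$os_months,
    event = rt$os_event,
    level = ifelse(rt[[s]] > median(rt[[s]]), "high", "low")
  )
  test <- survdiff(Surv(time, event) ~ level, data = d)
  # Two groups -> chi-squared statistic with 1 degree of freedom
  pchisq(test$chisq, df = 1, lower.tail = FALSE)
})
p_values
```

Running the same split and test for every score makes the disagreement between scores directly comparable, since any difference then comes from the scores rather than from the analysis choices.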


Hypoxia and MSI scores

  • The scoring methods agree poorly and label patients and tumors inconsistently.

  • Buffa and Winter scores align, but the other hypoxia scores and the MSI scores disagree.

  • Buffa and Ragnum scores increased with grade, especially in tumors.
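Agreement between scores can be quantified in two complementary ways: a rank-based (Spearman) correlation on the raw scores, and the share of patients given the same high/low label by two scores. The sketch below assumes a median split for the labels; the score names (including the two MSI columns `msi_a` and `msi_b`) and the data are synthetic placeholders, with `buffa` and `winter` built to share a common signal.

```r
set.seed(3)
n <- 100
# Synthetic scores (hypothetical names); buffa and winter share a common signal
signal <- rnorm(n)
scores <- data.frame(
  buffa  = signal + rnorm(n, sd = 0.3),
  winter = signal + rnorm(n, sd = 0.3),
  ragnum = rnorm(n),
  msi_a  = rnorm(n),
  msi_b  = rnorm(n)
)

# Spearman correlation between every pair of scores
rho <- cor(scores, method = "spearman", use = "pairwise.complete.obs")
round(rho, 2)

# High/low label per score (median split), then pairwise label agreement
high_low <- as.data.frame(lapply(scores, function(x) x > median(x, na.rm = TRUE)))
label_agreement <- function(a, b) mean(high_low[[a]] == high_low[[b]], na.rm = TRUE)

label_agreement("buffa", "winter")
label_agreement("buffa", "ragnum")
```

Label agreement near 0.5 is what two unrelated scores produce by chance under a median split, so it is a useful baseline when judging how inconsistently patients are labeled.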

Conclusion

  • Results for patients outside the 50–80 age range should be interpreted cautiously due to low sample counts.

  • The predominantly White patient cohort highlights the need for greater ethnic diversity.

  • Limited variability in several categories may constrain findings.

  • Future analyses could be strengthened by adding more clinical and molecular data and increasing sample diversity, including underrepresented groups.

References

Cerami, Ethan, Jianjiong Gao, Ugur Dogrusoz, Benjamin E. Gross, Selcuk O. Sumer, Bülent A. Aksoy, Anders Jacobsen, et al. 2012. “The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data.” Cancer Discovery 2 (5): 401–4. https://doi.org/10.1158/2159-8290.CD-12-0095.
David Gutman and Lee Cooper. n.d. “The Cancer Digital Slide Archive.” https://cancer.digitalslidearchive.org.
Magnus Strange. n.d. “BioRender Figure.” https://biorender.com.
Siegel, Rebecca L., Tyler B. Kratzer, Angela N. Giaquinto, Hyuna Sung, and Ahmedin Jemal. 2025. “Cancer Statistics, 2025.” CA: A Cancer Journal for Clinicians 75 (1): 10–45. https://doi.org/10.3322/caac.21871.
The Cancer Genome Atlas Research Network. 2018. “TCGA Pan-Cancer Atlas: Pancreatic Adenocarcinoma (PAAD).” https://www.cbioportal.org/study/summary?id=paad_tcga_pan_can_atlas_2018.